GRANT  NUMBER  DAMD17-94-J-4332 


TITLE: Statistical Methods for Analyzing Time-Dependent Events
in Breast Cancer Chemoprevention Studies


PRINCIPAL  INVESTIGATOR:  George  Y.C.  Wong,  Ph.D. 

CONTRACTING ORGANIZATION: Strang-Cornell Cancer Research Laboratory
New York, New York 10021

REPORT  DATE:  October  1996 


TYPE  OF  REPORT:  Annual 


PREPARED FOR: Commander
U.S. Army Medical Research and Materiel Command
Fort Detrick, Frederick, Maryland 21702-5012


DISTRIBUTION STATEMENT: Approved for public release;
distribution unlimited


The  views,  opinions  and/or  findings  contained  in  this  report  are 
those of the author(s) and should not be construed as an official
Department  of  the  Army  position,  policy  or  decision  unless  so 
designated  by  other  documentation. 




REPORT  DOCUMENTATION  PAGE 


Form  Approved 
OMB  No.  0704-0188 


Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources,
gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this
collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson
Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.


1.  AGENCY  USE  ONLY  (Leave  blank) 


2.  REPORT  DATE 
October  1996 


3.  REPORT  TYPE  AND  DATES  COVERED 
Annual  (30  Sep  95  -  29  Sep  96) 


4. TITLE AND SUBTITLE

Statistical Methods for Analyzing Time-Dependent Events
in Breast Cancer Chemoprevention Studies


5. FUNDING NUMBERS

DAMD17-94-J-4332


6.  AUTHOR(S) 


George  Y.C.  Wong,  Ph.D. 


7.  PERFORMING  ORGANIZATION  NAME(S)  AND  ADDRESS(ES) 

Strang-Cornell  Cancer  Research  Laboratory 
New York, New York 10021


8.  PERFORMING  ORGANIZATION 
REPORT  NUMBER 


9.  SPONSORING/MONITORING  AGENCY  NAME(S)  AND  ADDRESS(ES) 
Commander 

U.S. Army Medical Research and Materiel Command
Fort Detrick, Frederick, Maryland 21702-5012


10. SPONSORING/MONITORING
AGENCY REPORT NUMBER


11. SUPPLEMENTARY NOTES


19970228 081


12a. DISTRIBUTION / AVAILABILITY STATEMENT

Approved for public release; distribution unlimited


12b. DISTRIBUTION CODE


DTIC  QUALITY  INSPECTED  2 


13. ABSTRACT (Maximum 200 words)


The  overall  aim  of  our  research  proposal  is  the  statistical  inference  of  nonparametric  estimates, 
the  redistribution-to-the-inside  estimator  (RTIE)  and  the  generalized  maximum  likelihood 
estimator  (GMLE),  for  the  survival  function,  where  the  survival  time  is  subject  to  interval 
censoring. The GMLE is the standard optimal procedure in survival analysis. However, a closed
form expression for this estimator has not been derived, and little is known about its asymptotic
distribution theory (see Groeneboom and Wellner [1]). In our original proposal, we
created  the  RTIE,  which  has  a  closed  form  expression,  and  which  has  been  shown  by  us  to  be  the 
GMLE  under  certain  conditions.  Working  on  the  theory  of  the  RTIE  has  provided  us  with 
important  clues  to  the  asymptotic  theory  concerning  the  GMLE.  Our  research  efforts  in  the 
second  year  have  focused  on  attacking  the  asymptotic  distribution  of  the  GMLE  under  the 
assumption  that  the  censoring  random  vector  is  discrete.  Under  such  an  assumption,  we  have 
successfully  established  the  asymptotic  properties  of  the  GMLE  as  well  as  those  of  the  RTIE. 


14.  SUBJECT  TERMS 


Breast Cancer, Interval Censorship, Asymptotic Normality and Efficiency


15.  NUMBER  OF  PAGES 

13 


16.  PRICE  CODE 


17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
20. LIMITATION OF ABSTRACT: Unlimited


NSN 7540-01-280-5500

Standard Form 298 (Rev. 2-89)
Prescribed by ANSI Std. Z39-18
298-102


FOREWORD 


Opinions,  interpretations,  conclusions  and  recommendations  are 
those  of  the  author  and  are  not  necessarily  endorsed  by  the  U.S. 
Army. 


_ Where copyrighted material is quoted, permission has been
obtained to use such material.

_ Where material from documents designated for limited
distribution is quoted, permission has been obtained to use the
material.

_ Citations of commercial organizations and trade names in
this report do not constitute an official Department of Army
endorsement or approval of the products or services of these
organizations.

_ In conducting research using animals, the investigator(s)
adhered to the "Guide for the Care and Use of Laboratory
Animals," prepared by the Committee on Care and Use of Laboratory
Animals of the Institute of Laboratory Resources, National
Research Council (NIH Publication No. 86-23, Revised 1985).

_ For the protection of human subjects, the investigator(s)
adhered to policies of applicable Federal Law 45 CFR 46.

_ In conducting research utilizing recombinant DNA technology,
the investigator(s) adhered to current guidelines promulgated by
the National Institutes of Health.

_ In the conduct of research utilizing recombinant DNA, the
investigator(s) adhered to the NIH Guidelines for Research
Involving Recombinant DNA Molecules.

_ In the conduct of research involving hazardous organisms,
the investigator(s) adhered to the CDC-NIH Guide for Biosafety in
Microbiological and Biomedical Laboratories.


A.  TABLE  OF  CONTENTS 


Front Cover
Report Documentation Page
Foreword
A. Table of Contents
B. Introduction
C. Body
D. Conclusions
E. References
F. Appendices


B.  INTRODUCTION 

In  clinical  follow-up  studies,  subjects  are  monitored  at  regular  time  intervals  for  a 
physical  condition.  It  is  often  the  case  that  an  event  under  observation  can  take  place  in 
between  two  successive  visits,  and  it  may  not  be  possible  for  the  subject  to  know  the  time 
to  such  an  event  exactly.  For  example,  consider  the  situation  in  which  a  group  of  women 
at  high  risk  for  breast  cancer  is  asked  to  take  a  chemopreventive  substance  for  a  fixed  time 
period.  At  the  end  of  the  period,  each  participating  woman  is  required  to  submit  a  blood 
or  urine  sample  at  regular  intervals  in  order  to  monitor  the  level  of  a  validated  intermediate 
biomarker.  Let  X  denote  the  time  from  cessation  of  use  of  the  agent  to  the  loss  of  its 
protective  effect,  quantified  as  a  return  to  baseline  value  of  the  biomarker.  If  a  woman 
submits  a  sample  for  assay  on  a  daily  basis,  the  value  of  X  can  be  observed  exactly,  unless 
the  protective  effect  is  still  present  by  the  time  the  study  is  terminated  so  that  X  is  right 
censored  in  the  usual  sense  of  survival  analysis.  In  practice,  however,  the  follow-up  interval 
can  be  a  week  or  longer;  therefore  the  exact  value  of  X  is  generally  unknown  but  is  known  to 
lie  between  the  time  points  L  and  R,  where  L  is  the  number  of  days  from  cessation  of  agent 
intake  to  the  last  time  the  sample  was  assayed  and  the  protective  effect  was  still  present,  and 
R  is  the  number  of  days  from  cessation  of  agent  intake  to  the  most  recent  time  the  sample 
was assayed. If the protective effect is still present, then R takes the value infinity. In any
case, when the value of X is only known to lie between L and R, we say that X is censored in
the interval (L, R). Therefore the observed data consist of either censoring intervals (L, R)
or exact observations X = L = R.
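
The construction of (L, R) from a follow-up record can be sketched as follows. This is only an illustration under stated assumptions: the function name `censoring_interval` and its inputs are hypothetical, and R is taken here to be the first assay day at which the protective effect was found to be lost.

```python
import math


def censoring_interval(assay_days, effect_present):
    """Return (L, R) for one subject.

    assay_days:     increasing days (since cessation of the agent) of each assay.
    effect_present: for each assay, True if the protective effect was still present.

    Assumption of this sketch: R is the first assay day at which the effect
    was found to be lost, and R is infinity if that never happened.
    """
    last_present = 0.0  # L: last day the effect was still observed (0 if never)
    for day, present in zip(assay_days, effect_present):
        if present:
            last_present = day
        else:
            return (last_present, day)
    return (last_present, math.inf)


weekly = censoring_interval([7, 14, 21], [True, True, False])
still_protected = censoring_interval([7, 14, 21], [True, True, True])
```

With weekly assays, a subject whose effect was present on days 7 and 14 but lost by day 21 is recorded as censored in (14, 21); a subject still protected at the last assay is recorded as (21, infinity).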

We consider nonparametric estimation of the distribution function F(t) of a real-valued
random variable X (or its survival function S(t) = 1 - F(t), where F(t) = P{X ≤ t}), when
the sample data are incomplete due to restricted observation brought about by interval
censoring.

At  present,  there  are  only  two  estimation  procedures  of  S  for  interval-censored  data  that 
are  generalized  maximum  likelihood  estimates  (GMLE)  in  the  sense  of  Kiefer  and  Wolfowitz 
[2]. The first one is due to Peto [3] and makes use of the Newton-Raphson algorithm. The
second  is  due  to  Turnbull  [4]  and  makes  use  of  a  self-consistent  algorithm.  A  solution  to  the 
latter  algorithm  is  also  called  a  self-consistent  estimator  (SCE)  of  S.  In  each  case,  there  is 
no  closed  form  expression  for  the  estimator. 
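
To make the idea of a self-consistent iteration concrete, the following is a minimal sketch and not the implementation of [4]. It assumes the candidate mass points are supplied by the user (the `support` argument is an assumption of this sketch; a full treatment would derive them from the data) and reads each observation as l < X <= r, or as X = l when l = r.

```python
import numpy as np


def self_consistent_estimate(intervals, support, tol=1e-10, max_iter=10_000):
    """Iterate the self-consistency equation on a fixed grid of mass points.

    intervals: list of (l, r); read as l < X <= r, or X == l when l == r.
    support:   candidate mass points (an assumption of this sketch).
    Returns the estimated probability mass at each support point.
    """
    support = np.asarray(support, dtype=float)
    # alpha[i, j] = 1 when support point j is compatible with observation i.
    alpha = np.array(
        [[(l < s <= r) or (l == r == s) for s in support] for l, r in intervals],
        dtype=float,
    )
    if np.any(alpha.sum(axis=1) == 0):
        raise ValueError("every observation needs a compatible support point")
    p = np.full(support.size, 1.0 / support.size)
    for _ in range(max_iter):
        # Share each observation among its compatible points in proportion to p.
        weights = alpha * p
        weights /= weights.sum(axis=1, keepdims=True)
        new_p = weights.mean(axis=0)
        converged = np.max(np.abs(new_p - p)) < tol
        p = new_p
        if converged:
            break
    return p


grid = np.array([1.0, 3.0, 6.0])
masses = self_consistent_estimate([(0, 2), (0, 2), (2, 5), (5, np.inf)], grid)
survival_at_2 = masses[grid > 2].sum()  # estimate of S(2)
```

In this toy data set the intervals do not overlap, so the iteration settles immediately on the empirical proportions; with overlapping intervals the masses are redistributed over several iterations.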

In the first year of our research, we focused our attention on interval-censored
data that satisfy a condition which we call the DI condition: data (L_1, R_1), ..., (L_n, R_n) are
said to satisfy the DI condition if, given any two censoring intervals (L_i, R_i) and (L_j, R_j), either
they are disjoint or one is a subset of the other. In a clinical study in which every subject
has the same follow-up schedule, say at time points a_1, a_2, ..., a_k, then (L, R) = (0, a_1), or
(a_i, a_{i+1}), or (a_k, ∞), and hence such interval-censored data will satisfy the DI condition.
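
The DI condition is easy to check mechanically. The sketch below is an illustration only; the function name `satisfies_di` is hypothetical, and the intervals are treated as open intervals (L, R) with L < R, so two intervals that merely share an endpoint count as disjoint.

```python
import math
from itertools import combinations


def satisfies_di(intervals):
    """Return True if every pair of intervals is disjoint or nested.

    Intervals are (L, R) pairs with L < R, treated as open intervals
    (an assumption of this sketch).
    """
    for (l1, r1), (l2, r2) in combinations(intervals, 2):
        disjoint = r1 <= l2 or r2 <= l1
        nested = (l1 <= l2 and r2 <= r1) or (l2 <= l1 and r1 <= r2)
        if not (disjoint or nested):
            return False
    return True


# Common schedule at days 3 and 6: every interval is (0,3), (3,6) or (6,inf).
common_schedule = [(0, 3), (3, 6), (6, math.inf), (3, 6)]
# One subject skipped the day-6 visit and was next seen on day 9.
with_missed_visit = [(0, 3), (3, 9), (6, math.inf)]
```

The second data set shows how a single missed appointment breaks the condition: (3, 9) and (6, infinity) overlap without either containing the other.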

Under the DI interval-censorship model, we have extended Efron's [5] redistribution-to-the-right
idea for right-censored data and proposed a redistribution-to-the-inside (RTI)
method to yield a nonparametric estimator of S(t), which we call the redistribution-to-the-inside
estimator (RTIE). Such an estimate has a closed form expression and can be quickly
calculated  for  interval-censored  data  of  any  size.  The  availability  of  an  explicit  expression 
for  the  RTIE  has  enabled  us  to  show  that  it  is  the  GMLE  under  the  DI  condition,  and  to 
establish  asymptotic  properties  of  the  RTIE. 

More often than not, interval-censored data do not satisfy the DI condition. In a clinical
follow-up situation, for example, a patient may miss a particular appointment. Therefore,
it is important to consider asymptotic inference under a more general condition of interval
censorship. Interval-censored data also arise quite naturally in medical follow-up studies and
in industrial life-testing. A general interval censorship model can be described as follows.
Suppose the survival time X has a distribution function F. What we actually observe is an
interval I, possibly a singleton set. If I = [X, X], we have an exact observation; otherwise,
we only know that X lies in the interval I, that is, the observation is interval censored. An
observation is called right censored if I has right endpoint ∞, left censored if I has left
endpoint 0, exact if I is a singleton set, and strictly interval censored if the interval I is none
of the above.
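
The four kinds of observation can be told apart from the endpoints alone. The following sketch is illustrative only; the function name and the string labels are hypothetical, and an interval with left endpoint 0 and right endpoint infinity is labelled right censored here by convention of this sketch.

```python
import math


def classify_observation(left, right):
    """Label an observed interval with endpoints left <= right."""
    if left == right:
        return "exact"
    if math.isinf(right):
        return "right censored"
    if left == 0:
        return "left censored"
    return "strictly interval censored"


labels = [classify_observation(l, r)
          for l, r in [(3, 3), (5, math.inf), (0, 4), (2, 6)]]
```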

There  are  4  typical  situations  in  which  interval-censored  data  can  occur. 

Case 2 interval-censored data (C2 data) consist of right-, left- and strictly interval-censored
but not exact observations. Finkelstein and Wolfe [6] presented a set of case 2
interval-censored  data  in  comparing  two  different  treatments  for  breast  cancer  patients. 
The  censoring  intervals  (in  months)  arose  in  the  follow-up  studies  for  patients  treated  with 
radiotherapy  and  chemotherapy.  The  failure  time  is  the  time  until  cosmetic  deterioration, 
as  determined  by  the  appearance  of  breast  retraction. 

Partially  interval-censored  data  (PIC  data)  consist  of  C2  data  and  exact  observations. 
Yu,  Li  and  Wong  [7]  presented  a  set  of  PIC  data  as  follows. 

Example 1. Three hundred and seventy-four women with stages I-III unilateral invasive
breast  cancer  surgically  treated  on  the  Breast  Service  of  Memorial  Sloan-Kettering  Cancer 
Center  between  1985  and  1990  were  followed  for  relapse.  The  median  follow-up  duration 
was  46  months.  Relapse  time  was  given  by  the  time  interval  between  surgery  and  the  initial 
relapse.  For  a  relapsed  patient  who  was  followed  closely  (for  instance,  during  the  initial 
follow-up  period  after  surgery),  an  exact  value  for  the  relapse  time  could  be  meaningfully 
assessed.  Otherwise,  a  relapse  time  between  two  successive  follow-up  visits  would  have  to 
be regarded as interval censored. If a patient had not relapsed by the end of the study,
then  her  relapse  time  was  right  censored.  Of  the  374  relapse  times,  300  were  right  censored, 
53  were  interval  censored,  and  21  were  observed  exactly. 

Doubly-censored  data  (DC  data)  consist  of  right-,  left-censored  and  exact  observations. 
Examples  of  DC  data  can  be  found  in  [8]. 

Case 1 interval-censored data (C1 data) consist of right-censored and left-censored
observations. Examples of C1 data can be found in [9] and [10].

Four  different  interval  censorship  models  have  been  proposed  corresponding  to  the  four 
different  types  of  data.  They  are  the  C2  model,  the  mixture  interval  censorship  model  (MIC 
model), the DC model and the C1 model. Only the C2 and the MIC models involve strictly
interval-censored  observations. 
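
The correspondence between the kinds of observation present in a data set and the four data types can be summarized in a few lines. This is a sketch for orientation only; the function name and the labels "exact" and "strict" are hypothetical, and only the presence or absence of exact and strictly interval-censored observations is used.

```python
def data_type(kinds):
    """Map the set of observation kinds in a data set to its data type.

    kinds: iterable of labels drawn from "right", "left", "strict", "exact".
    """
    kinds = set(kinds)
    has_strict = "strict" in kinds
    has_exact = "exact" in kinds
    if has_strict:
        return "PIC" if has_exact else "C2"  # PIC data are analyzed with the MIC model
    return "DC" if has_exact else "C1"


example_1 = data_type({"right", "strict", "exact"})
```

The data of Example 1, which contain right-censored, interval-censored and exact relapse times, fall under the PIC type in this scheme.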

The GMLE for interval-censored data is a distribution function that maximizes the likelihood
function (Kiefer and Wolfowitz [2]). Peto [3] and Turnbull [4] derived the GMLE via
numerical methods and conjectured that it has an asymptotic normal distribution.
However, Groeneboom and Wellner [1] conjectured that it does not. So far, the asymptotic
distribution of the GMLE of F has not been established for data involving strictly
interval-censored observations (see, e.g., Groeneboom and Wellner [1]). Thus, in research
where interval-censored data occur, the current practice is to treat the strictly
interval-censored data as right-censored data and to apply the Kaplan-Meier estimator.
The asymptotic properties of the latter estimator are well understood. However, this
practice inevitably introduces bias into the statistical analysis.

To  study  the  asymptotic  properties  of  the  GMLE,  we  make  the  following  assumptions: 

(AS1)  The  censoring  distribution  is  discrete  but  the  survival  distribution  is  arbitrary. 

(AS2)  The  censoring  distribution  has  a  support  set  of  finitely  many  points,  but  the  survival 
distribution  is  arbitrary. 

In  our  second  year,  we  have  accomplished  several  important  tasks  for  the  GMLE  under 
both  DI  and  non-DI  conditions: 

1. Under the C1 model or the C2 model, we have proved that the GMLE is strongly
consistent under Assumption (AS1).

2. Under the C1 model, we have proved that the GMLE is asymptotically normal and
efficient under Assumption (AS1).

3. Under the C2 model, we have proved that the GMLE is asymptotically normal and
efficient under Assumption (AS2).

4. We proposed the MIC model for the PIC data.

5. Under the MIC model, we have proved that the SCE and the GMLE are strongly
consistent under Assumption (AS1).

6. Under the MIC model, we have proved that the SCE and the GMLE are asymptotically
normal and efficient under Assumption (AS1).

Four completed manuscripts ([7], [11], [12] and [13]), pertaining to results 1 through 6,
have been submitted to peer-reviewed statistical journals. We are still preparing a fifth
paper [14], pertaining to results 1 and 3. We presented some of our results at the Sydney
International Statistical Congress, July 8-12, 1996, and at the Joint Statistical Meetings
of the Institute of Mathematical Statistics, the American Statistical Association and the
International Biometric Society, August 4-8, 1996, in Chicago.




C.  BODY 
Main  Results 


C.1. C2 Model.

By Assumption (AS1), there are only countably many $(y_i, z_j)$'s; we can assume that
they are $\{a_1, a_2, \ldots\}$.

Theorem 1. Under the C2 model and Assumption (AS1), the GMLE $\hat{F}(x)$ converges to
$F(x)$ a.s. for all $x = a_i$, $i \ge 1$.

Theorem 2. Under the C2 model and Assumption (AS2), suppose that there are altogether
$m$ points $a_1, \ldots, a_m$ and that $F(a_i) > F(a_{i-1})$ for $i = 2, \ldots, m$. Then
$(\hat{F}(x) - F(x))/\sigma \to N(0, 1)$ in distribution as $n \to \infty$ for $x = a_i$,
where $\sigma^2$ is given in Yu, Schick, Li and Wong [11].

To see how close the approximation is to the theoretical results, we present numerical
results in Table 1. The measure $dF$ assigns weights 0.2, 0.1, 0.25, 0.3 and 0.15 to the
points 1, 3, 5, 7 and 9, respectively. The measure $dG$ assigns weights 0.4 and 0.6 to the
points (2, 6) and (4, 8), respectively. In each simulation, a sample size of 800 was used.

In the table, $\bar{F}(x)$ stands for the average of $\hat{F}(x)$ over 1000 repetitions,
$SD(\hat{F}(x))$ for the sample standard deviation of $\hat{F}(x)$, and $\sigma(\hat{F}(x))$
for the standard deviation of $\hat{F}(x)$ computed through the formula given by Theorem 2.


Table 1. Standard Deviation of the GMLE

  x     $F(x)$    $\bar{F}(x)$    $SD(\hat{F}(x))$    $\sigma(\hat{F}(x))$
  2     0.20      0.1996          0.0222              0.0224
  4     0.30      0.3006          0.0207              0.0209
  6     0.55      0.5512          0.0273              0.0278
  8     0.85      0.8500          0.0165              0.0163

The sample SD's in the table match well with the values computed from the theoretical
limits we have derived.
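
The agreement reported in Table 1 can be checked with a few lines of arithmetic. The sketch below only re-uses numbers stated in the text and the table; the variable and function names are hypothetical. It recomputes the true F(x) from the point masses of dF and measures the relative gap between the sample and theoretical standard deviations.

```python
# Point masses of dF as stated in the text.
mass = {1: 0.20, 3: 0.10, 5: 0.25, 7: 0.30, 9: 0.15}


def true_cdf(x):
    """F(x) = total mass at points not exceeding x."""
    return sum(weight for point, weight in mass.items() if point <= x)


# Table 1 rows: x -> (sample SD, SD from the formula of Theorem 2).
table = {
    2: (0.0222, 0.0224),
    4: (0.0207, 0.0209),
    6: (0.0273, 0.0278),
    8: (0.0165, 0.0163),
}

relative_gap = {x: abs(sd - sigma) / sigma for x, (sd, sigma) in table.items()}
largest_gap = max(relative_gap.values())
```

The recomputed values of F at 2, 4, 6 and 8 agree with the second column of the table, and the largest relative gap between the two standard deviation columns is below 2 percent.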

C.2.  MIC  Model. 

The investigator proposed for PIC data the MIC model, which is a mixture of a C2
interval censorship model and a right censorship (RC) model (see Yu, Li and Wong [12]).
The C2 model assumes that $X$ is a non-negative random variable (failure time) with
distribution function $F$ and $(Y, Z)$ is a non-negative random vector (censoring interval) with
joint distribution function $G(u, v)$. It further assumes that $Y < Z$ with probability one
(w.p.1), and that $X$ and $(Y, Z)$ are independent. The RC model assumes that there is a
random censoring time $T$, with distribution function $G_T$, which is independent of $X$, and
the information observed from the RC model is $(\min(X, T), I(X \le T))$. We introduce a
random variable, $D$, to distinguish failure times coming from the two models:

$$D = \begin{cases} 1 & \text{if the observation is from the RC model} \\ 0 & \text{if the observation is from the C2 model.} \end{cases}$$

Let $P\{D = 1\} = \pi$, where $0 < \pi < 1$. Formally, a PIC data point is regarded as an
observation from the RC model w.p. $\pi$ and from the C2 model w.p. $1 - \pi$.

To express observed PIC data as intervals, we introduce the notation $\lfloor L, R \rfloor$ defined as
follows:

$$\lfloor L, R \rfloor = \begin{cases}
[0, Y) & \text{if } D = 0 \text{ and } X < Y \\
[Y, Z) & \text{if } D = 0 \text{ and } Y \le X < Z \\
[Z, \infty) & \text{if } D = 0 \text{ and } X \ge Z \\
(T, \infty) & \text{if } D = 1 \text{ and } X > T \\
[X, X] & \text{if } D = 1 \text{ and } X \le T,
\end{cases}$$

where $[X, X]$ is an exact observation. Let $(L_i, R_i)$, $i = 1, 2, \ldots, n$, be a random sample
from the random vector $(L, R)$ with common joint distribution function $Q(l, r)$, and $\lfloor l, r \rfloor$ a
realization of $\lfloor L, R \rfloor$. We say that the PIC data $\lfloor L, R \rfloor$ are from a mixture interval censorship
model, called the MIC model.
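
The data-generating mechanism of the MIC model can be sketched as a small simulation. Everything distributional here is an assumption made for illustration: the exponential failure times, the uniform right-censoring times, and the two inspection pairs (2, 6) and (4, 8) are hypothetical choices, and the function name `simulate_mic` is not from the report. Open versus closed endpoints are not tracked; an exact observation is encoded as left = right.

```python
import numpy as np


def simulate_mic(n, pi, rng):
    """Draw n observations from a mixture of an RC model and a C2 model.

    pi is P{D = 1}, the probability that an observation comes from the RC model.
    Returns (left, right, from_rc); exact observations have left == right.
    """
    x = rng.exponential(5.0, size=n)      # failure times X (hypothetical F)
    from_rc = rng.random(n) < pi          # D = 1 means the RC model applies
    t = rng.uniform(2.0, 10.0, size=n)    # censoring times T (hypothetical G_T)
    y = rng.choice([2.0, 4.0], size=n)    # inspection times with Y < Z (hypothetical G)
    z = y + 4.0
    left = np.empty(n)
    right = np.empty(n)
    for i in range(n):
        if from_rc[i]:
            if x[i] <= t[i]:              # exact observation
                left[i] = right[i] = x[i]
            else:                         # right censored at T
                left[i], right[i] = t[i], np.inf
        elif x[i] < y[i]:                 # failure before the first inspection
            left[i], right[i] = 0.0, y[i]
        elif x[i] < z[i]:                 # failure between the two inspections
            left[i], right[i] = y[i], z[i]
        else:                             # failure after the second inspection
            left[i], right[i] = z[i], np.inf
    return left, right, from_rc


rng = np.random.default_rng(0)
left, right, from_rc = simulate_mic(500, 0.3, rng)
```

In such a sample, exact observations can only come from the RC component, while every interval from the C2 component has its left endpoint at 0 or at one of the inspection times.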

Define $\tau = \sup\{t : P\{\min(X, T) < t\} < 1\}$ and $\tau_Z = \sup\{t : P\{Z < t\} < 1\}$. We
assume throughout that $\tau > \tau_Z$. This assumption is reasonable
since under the RC model $[0, \tau]$ represents the whole time period of a follow-up study.

Define $O_a = \{x : P(X \text{ is not censored} \mid X = x) > 0\}$. Let $O_c = \bigcap_{r = \infty} \lfloor l, r \rfloor$, the
intersection of all observed intervals with right endpoint infinity, and $O = [0, \infty) \setminus O_c$, where
"$\setminus$" denotes set difference. For PIC data, it can be shown that $O \subset [0, \tau]$. Whether $O = [0, \tau]$ or
not depends on $F$, $G$ and $G_T$. To take the right endpoint $\tau$ into account, recall that under
the RC model the strong consistency of the Kaplan-Meier estimator at $\tau$ requires either
$F(\tau-) = 1$ or $P\{T = \tau\} > 0$ (cf. Yu and Li [15], p. 416). Since the MIC model includes the
RC model as a special case, a similar assumption is needed and is given as follows.

(AS3) Either $P\{X \in O\} = 1$ or $P\{L = \tau\} > 0$.

Theorem 3. Under (AS1) and (AS3), the SCE $\hat{F}(x)$ satisfies

$$\lim_{n \to \infty} \sup_{x \in O} |\hat{F}(x) - F(x)| = 0 \quad \text{a.s.}$$

To establish asymptotic normality for the SCE, we need an additional assumption on
the distribution function, namely,

(AS4) $P\{X \in I_i \cap I_j\} > 0$ for any two realizations, $I_i$ and $I_j$, of $\lfloor L, R \rfloor$, provided
$I_i \cap I_j \ne \emptyset$.

Theorem 4. Under Assumptions (AS1), (AS3) and (AS4), the SCE $\hat{F}(x)$ satisfies, for
$x \in O$, $(\hat{F}(x) - F(x))/\sigma \to N(0, 1)$ in distribution as $n \to \infty$, where the
notation is the same as in Theorem 2.

We  apply  Theorem  4  to  the  breast  cancer  data  in  Example  1  to  obtain  the  SCE  and 
its  asymptotic  variance  for  the  survival  function  S(t),  which  represents  the  proportion  of 
women  who  were  relapse  free  at  time  t.  Figure  1  gives  the  survival  plot  together  with  the 
95%  asymptotic  confidence  bands. 

Fig. 1. Self-Consistent Estimate for Breast Cancer Data
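
A normal-approximation interval of the kind suggested by Theorem 4 can be computed as below. This is a sketch under assumptions: it treats the limits as pointwise intervals of the form estimate ± 1.96 standard deviations, clipped to [0, 1], and the input values are hypothetical rather than taken from the data of Example 1.

```python
def pointwise_interval(estimate, sd, z=1.96):
    """Normal-approximation interval for a survival probability, clipped to [0, 1]."""
    lower = max(0.0, estimate - z * sd)
    upper = min(1.0, estimate + z * sd)
    return lower, upper


# Hypothetical values: estimated S(t) = 0.80 with asymptotic SD 0.02.
lower, upper = pointwise_interval(0.80, 0.02)
```

For these hypothetical inputs the interval runs from about 0.761 to about 0.839.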


C.3. C1 Model.

By Assumption (AS1), there are only countably many $(y_i, z_j)$'s; without loss of
generality, we can assume that they are $\{a_1, a_2, \ldots\}$.

Theorem 5. Under the C1 model and Assumption (AS1), the GMLE $\hat{F}(x)$ converges to
$F(x)$ a.s. for all $x = a_i$, $i \ge 1$.

Theorem 6. Under the C1 model and Assumption (AS1), suppose that $F(z) < F(x) < F(y)$
for $z, x, y \in \{a_i\}_{i \ge 1}$ with $z < x < y$, and that there is no $a_i \in (z, y)$ other than $x$.
Then we have [...] $\to N(0, 1)$ in distribution as $n \to \infty$ for $x = a_i$, where $g(x) = P\{X = x\}$.


D.  CONCLUSIONS 

As we pointed out in the Introduction, interval-censored data are commonly encountered
in cancer follow-up studies, and there has been a lack of estimation procedures for the
survival function with established asymptotic properties. In our second year of research,
we have derived the asymptotic distribution of the GMLE under Assumption (AS1) or (AS2).
In the Body section, we have used our asymptotic results for the MIC model to produce the
survival curve and its 95% confidence bands for overall relapse-free survival for
interval-censored data from 374 women with stage I, II or III breast cancer after treatment
by surgery.

Our immediate research goals for the third year are to extend the results established here
to distribution functions more general than those in Assumptions (AS1) and (AS2).
Specifically, we will extend the method used to obtain the asymptotic distribution
of the GMLE under (AS1) or (AS2) to the general case in which the distribution functions
are arbitrary. We expect these extensions to be statistically challenging.


F. APPENDICES

(1) Institute of Mathematical Statistics Meeting no. 246, 8-12 July 1996, Sydney, N.S.W.


TITLE: Variance of the MLE of a Survival Function with Interval Censored Data

Qiqing Yu, Linxiong Li and George Y. C. Wong

SUNY at Binghamton, University of New Orleans and Strang Cancer Preventive Institute

ABSTRACT: Interval-censored data consist of n pairs of observations $(l_i, r_i)$, $i = 1, \ldots, n$,
where $l_i \le r_i$. We either observe the exact survival time X if $l_i = r_i$ or only know
$X \in (l_i, r_i)$ otherwise. We establish the asymptotic normality of the nonparametric MLE
of a survival function $S(t)$ ($= P(X > t)$) with such interval-censored data and present
an estimate of the asymptotic variance of the MLE. We show that the convergence rate
in distribution is $\sqrt{n}$. A simulation study also supports our result. An application to
cancer research is presented.

[ Corresponding author: Qiqing Yu, Math department, SUNY at Binghamton, NY 13902,
qyu@math.binghamton.edu ]

Paper  presented  in  person,  contributed  paper. 


(2) Institute of Mathematical Statistics Meeting no. 247, 4-8 August 1996, Chicago, Illinois

TITLE: ESTIMATION OF A SURVIVAL FUNCTION WITH CASE 1 INTERVAL-CENSORED DATA

Qiqing Yu, Anton Schick, Linxiong Li and George Y. C. Wong
SUNY at Binghamton, University of New Orleans and Strang Cancer Preventive Institute

ABSTRACT: Case 1 interval-censored data consist of either right-censored data or
left-censored data but not exact observations. Let $F(x)$ and $G(y)$ be the distribution
functions of the survival time X and censoring time Y, respectively. Groeneboom and Wellner
(1992, p. 100) (G&W) establish the consistency and asymptotic distribution of the MLE of
F under the assumption that $F'(x)$ and $G'(y)$ are both positive and continuous. Under the
assumption that X is arbitrary, but Y takes on finitely many values, we establish the
consistency, asymptotic normality and efficiency of the MLE of $F(y)$ at the observations, y, of Y,
and present a consistent estimate of the asymptotic variance of the MLE. The convergence
rate in distribution is $n^{1/3}$ under G&W's assumption, but it is $\sqrt{n}$ under our assumption.
Simulation results indicate that the sample variance is very close to the theoretical value
of the asymptotic variance given in our paper, even for a sample size of 100.

[  Corresponding  author:  Qiqing  Yu,  Math  department,  SUNY  at  Binghamton,  NY  13902, 
qyu@math.binghamton.edu  ] 

Paper  presented  in  person,  contributed  paper. 